Quantitative Big Imaging

Kevin Mader
14 April 2016

ETHZ: 227-0966-00L

Groups of Objects and Distributions

Course Outline

  • 25th February - Introduction and Workflows
  • 3rd March - Image Enhancement (A. Kaestner)
  • 10th March - Basic Segmentation, Discrete Binary Structures
  • 17th March - Advanced Segmentation
  • 24th March - Analyzing Single Objects
  • 7th April - Analyzing Complex Objects
  • 14th April - Spatial Distribution
  • 21st April - Statistics and Reproducibility
  • 28th April - Dynamic Experiments
  • 12th May - Scaling Up / Big Data
  • 19th May - Guest Lecture - High Content Screening
  • 26th May - Guest Lecture - Machine Learning / Deep Learning and More Advanced Approaches
  • 2nd June - Project Presentations

Literature / Useful References

Books

  • Jean Claude, Morphometry with R
  • John C. Russ, “The Image Processing Handbook” (Boca Raton, CRC Press)
    • Available online within domain ethz.ch (or proxy.ethz.ch / public VPN)
  • J. Weickert, Visualization and Processing of Tensor Fields

Papers / Sites

  • Voronoi Tessellations

    • Ghosh, S. (1997). Tessellation-based computational methods for the characterization and analysis of heterogeneous microstructures. Composites Science and Technology, 57(9-10), 1187–1210
    • Wolfram Explanation
  • Self-Avoiding / Nearest Neighbor

    • Schwarz, H., & Exner, H. E. (1983). The characterization of the arrangement of feature centroids in planes and volumes. Journal of Microscopy, 129(2), 155–169.
    • Kubitscheck, U. et al. (1996). Single nuclear pores visualized by confocal microscopy and image processing. Biophysical Journal, 70(5), 2067–77.
  • Alignment / Distribution Tensor

    • Mader, K. et al. (2013). A quantitative framework for the 3D characterization of the osteocyte lacunar system. Bone, 57(1), 142–154
    • Aubouy, M., et al. (2003). A texture tensor to quantify deformations. Granular Matter, 5, 67–70. Retrieved from http://arxiv.org/abs/cond-mat/0301018
  • Two point correlation

    • Dinis, L., et al. (2007). Analysis of 3D solids using the natural neighbour radial point interpolation method. Computer Methods in Applied Mechanics and Engineering, 196(13-16)

Previously on QBI ...

  • Image Enhancement
    • Highlighting the contrast of interest in images
    • Minimizing Noise
  • Understanding image histograms
  • Automatic Methods
  • Component Labeling
  • Single Shape Analysis
  • Complicated Shapes

Outline

  • Motivation (Why and How?)
  • Local Environment
    • Neighbors
    • Voronoi Tessellation
    • Distribution Tensor
  • Global Environment
    • Alignment
    • Self-Avoidance
    • Two Point Correlation Function

Metrics

We examine a number of different metrics in this lecture. In addition to classifying them as Local or Global, we can classify them as point-based or voxel-based operations.

Point Operations

  • Nearest Neighbor
  • Delaunay Triangulation
    • Distribution Tensor
  • Point (Center of Volume)-based Voronoi Tessellation
  • Alignment
x y z
0 1 1
3 1 1
1 1 0
1 1 2

Voxel Operations

  • Voronoi Tessellation
  • Neighbor Counting
  • 2-point (N-point) correlation functions


What do we start with?

Going back to our original cell image

  1. We have been able to get rid of the noise in the image and find all the cells (lecture 2-4)
  2. We have analyzed the shape of the cells using the shape tensor (lecture 5)
  3. We even separated cells joined together using Watershed (lecture 6)

We can characterize the sample by the average and standard deviation of volume, orientation, surface area, and other metrics

Motivation (Why and How?)

With all of these images, the first step is always to understand exactly what we are trying to learn from our images.

All Cells

  1. We want to know how many cells are alive

    • Maybe small cells are dead and larger cells are alive \( \rightarrow \) examine the volume distribution
    • Maybe living cells are round and dead cells are really spiky and pointy \( \rightarrow \) examine anisotropy
  2. We want to know where the cells are alive or most densely packed

    • We can visually inspect the sample (maybe even color by volume)
    • We can examine the raw positions (x,y,z) but what does that really tell us?
    • We can make boxes and count the cells inside each one
    • How do we compare two regions in the same sample or even two samples?

Motivation (continued)

All Cells

  1. We want to know how the cells are communicating
    • Maybe physically connected cells (touching) are communicating \( \rightarrow \) watershed
    • Maybe cells oriented the same direction are communicating \( \rightarrow \) average? orientation
    • Maybe cells which are close enough are communicating \( \rightarrow \) ?
    • Maybe cells form hub and spoke networks \( \rightarrow \) ?

Motivation (continued)

All Cells

  1. We want to know how the cells are nourished
    • Maybe closely packed cells are better nourished \( \rightarrow \) count cells in a box?
    • Maybe cells are oriented around canals which supply them \( \rightarrow \) ?

So what do we still need

  1. A way for counting cells in a region and estimating density without creating arbitrary boxes
  2. A way for finding out how many cells are near a given cell (its nearest neighbors)
  3. A way for quantifying how far apart cells are and then comparing different regions within a sample
  4. A way for quantifying and comparing orientations

What would be really great?

A tool which could be adapted to answering a large variety of problems

  • multiple types of structures
  • multiple phases

Destructive Measurements

With most imaging techniques and sample types, the task of measurement itself impacts the sample.

  • Even techniques like X-ray tomography which claim to be non-destructive still impart significant to lethal doses of X-ray radiation for high resolution imaging
  • Electron microscopy, auto-tome-based methods, histology are all markedly more destructive and make longitudinal studies impossible
  • Even when such measurements are possible
    • Registration can be a difficult task and introduce artifacts

Why is this important?

  • We need techniques which allow us to compare different samples of the same type
  • These techniques should be sensitive to common transformations
    • Sample B after the treatment looks like Sample A stretched to be 2x larger
    • The volume fraction at the center is higher than the edges but organization remains the same

Ok, so now what?

Smaller Region

\[ \downarrow \]

x y vx vy
20.19 10.69 -0.95 -0.30
20.19 10.69 0.30 -0.95
293.08 13.18 -0.50 0.86
293.08 13.18 -0.86 -0.50
243.81 14.23 0.68 0.74
243.81 14.23 -0.74 0.68

\[ \cdots \]

So if we want to know the mean or standard deviation of the positions or orientations, we can analyze them easily.

           Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
x          6.90   215.70   280.50   258.20   339.00   406.50
y         10.69   111.60   221.00   208.60   312.50   395.20
Length     1.06     1.57     1.95     2.08     2.41     4.33
vx        -1.00    -0.94    -0.70    -0.42     0.07     0.71
vy        -1.00    -0.70     0.02     0.04     0.71     1.00
Theta   -180.00  -134.10    -0.50    -4.67   130.60   177.70

  • But what if we want more or other information?

Simple Statistics

When given a group of data, it is common to report a mean value since this is easy to compute and to communicate, for example: the mean bone thickness is 0.3mm. This is particularly attractive for groups with many samples because a single mean is far more compact than the list of all the individual points.

but means can lie

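  • the arithmetic mean of 0\( ^\circ \) and 180\( ^\circ \) is 90\( ^\circ \), an orientation perpendicular to both
  • the arithmetic distance between -180\( ^\circ \) and 179\( ^\circ \) is 359\( ^\circ \), although the two directions are only 1\( ^\circ \) apart
  • since we have not defined a tip or head, 0\( ^\circ \) and 180\( ^\circ \) are actually the same orientation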
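
One common way around these problems is to average unit vectors instead of raw angles, and to double the angles when head and tail are not distinguished. The sketch below is only an illustration of that idea, not the method used for the plots in this lecture; the function names are made up for this example.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees, computed by averaging unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

def angular_distance_deg(a, b):
    """Smallest rotation (0 to 180 degrees) that takes direction a onto b."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def axial_mean_deg(angles_deg):
    """Mean of undirected orientations (0 and 180 are equivalent).

    Doubling the angles maps theta and theta + 180 onto the same unit
    vector; the result is halved again and lies in (-90, 90].
    """
    doubled = [2.0 * a for a in angles_deg]
    return circular_mean_deg(doubled) / 2.0

# -180 and 179 degrees point in almost the same direction
print(angular_distance_deg(-180, 179))
# 0 and 180 degrees describe the same orientation when there is no head or tail
print(axial_mean_deg([0, 180]))
# 170 and -170 degrees average to a direction near 180, not 0
print(circular_mean_deg([170, -170]))
```

The doubled-angle trick only makes sense for orientations without a defined tip or head; for directed vectors the plain circular mean is the appropriate choice.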

some means are not very useful


Calculating Density

One of the first metrics to examine for a distribution is density \( \rightarrow \) how many objects are in a given region or volume.

It is deceptively easy to calculate: the number of objects divided by the volume of the region.

[Figure: Grid Nearest Neighbor]

On its own it does not tell us much, since many very different systems have the same density. And what if we want the density at a single point? Does that even make sense?

[Figure: Grid Nearest Neighbor]
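
To make the limitation concrete, here is a minimal sketch with two made-up 2D point sets that have the same overall density but very different arrangements. The helper names, the point sets and the choice of a 2 x 2 grid of boxes are all assumptions for illustration.

```python
def density(points, width, height):
    """Global density: number of objects divided by the area of the region."""
    return len(points) / (width * height)

def box_counts(points, width, height, n_boxes):
    """Count points in an n_boxes x n_boxes grid of equally sized boxes."""
    counts = [[0] * n_boxes for _ in range(n_boxes)]
    for x, y in points:
        # Clamp so that points on the far edge fall into the last box
        i = min(int(x / width * n_boxes), n_boxes - 1)
        j = min(int(y / height * n_boxes), n_boxes - 1)
        counts[j][i] += 1
    return counts

# 16 points spread evenly over a 4 x 4 region
spread = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
# 16 points bunched into one corner of the same region
clustered = [(0.25 + 0.1 * i, 0.25 + 0.1 * j) for i in range(4) for j in range(4)]

for name, pts in [("spread", spread), ("clustered", clustered)]:
    print(name, density(pts, 4, 4), box_counts(pts, 4, 4, 2))
```

Both systems report the same global density, while the box counts differ completely. The box counts in turn depend on the arbitrary choice of box size, which is the problem raised earlier.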

Neighbors

Definition

Oxford American \( \rightarrow \) be situated next to or very near to (another)

  • Does not sound very scientific
  • How close?
    • Touching, closer than anything else?

Nearest Neighbor (distance)

Given a set of objects with centroids at \[ \textbf{P}=\begin{bmatrix} \vec{x}_0,\vec{x}_1,\cdots,\vec{x}_i \end{bmatrix} \]

We can define the nearest neighbor as the position of the object in our set which is closest

\[ \vec{\textrm{NN}}(\vec{y}) = \textrm{argmin}(||\vec{y}-\vec{x}|| \forall \vec{x} \in \textbf{P}-\vec{y}) \]

We define the distance as the Euclidean distance from the current point to that point, and the angle as the direction of the vector from the current point to that point, measured from the x axis (\( \vec{i} \))

\[ \textrm{NND}(\vec{y}) = \textrm{min}(||\vec{y}-\vec{x}|| \forall \vec{x} \in \textbf{P}-\vec{y}) \]

\[ \textrm{NN}\theta(\vec{y}) = \tan^{-1}\frac{(\vec{\textrm{NN}}-\vec{y})\cdot \vec{j}}{(\vec{\textrm{NN}}-\vec{y})\cdot \vec{i}} \]
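
The definitions above can be sketched as a brute-force search over all centroids. This is a minimal illustration rather than an efficient implementation (a spatial index would be used for large data sets). The function name is hypothetical, and `atan2` is used in place of a plain \( \tan^{-1} \) of the ratio so that the quadrant of the angle is resolved.

```python
import math

def nearest_neighbor(points, index):
    """Return (neighbor_index, NND, NN theta in degrees) for points[index]."""
    yx, yy = points[index]
    best = None
    for k, (x, y) in enumerate(points):
        if k == index:
            continue  # the point itself is excluded (P - y in the formulas)
        d = math.hypot(x - yx, y - yy)
        if best is None or d < best[1]:
            # Angle of the vector from the current point to the candidate
            theta = math.degrees(math.atan2(y - yy, x - yx))
            best = (k, d, theta)
    return best

# The three distinct (x, y) positions from the table shown earlier
centroids = [(20.19, 10.69), (293.08, 13.18), (243.81, 14.23)]
for i in range(len(centroids)):
    print(i, nearest_neighbor(centroids, i))
```

With exactly equidistant neighbors this version simply keeps the first one it encounters.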

Nearest Neighbor Definition

So examining a simple starting system like a grid, we already start running into issues.

  • In a perfect grid-like structure each object has 4 equidistant neighbors (6 in 3D)
  • Which one is closest?

We thus add an additional clause (only relevant for simulated data) where if there are multiple equidistant neighbors, a nearest is chosen randomly

This ensures when we examine the orientation distribution (NN\( \theta \)) of the neighbors it is evenly distributed
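
A minimal sketch of this tie-breaking clause, assuming that two distances count as equal when they differ by less than a small tolerance. The tolerance `tol`, the function name and the use of a seeded `random.Random` are assumptions for this example.

```python
import math
import random

def nearest_neighbor_random_tie(points, index, rng, tol=1e-9):
    """Return (neighbor_index, distance), choosing randomly among ties."""
    yx, yy = points[index]
    dists = [(math.hypot(x - yx, y - yy), k)
             for k, (x, y) in enumerate(points) if k != index]
    d_min = min(d for d, _ in dists)
    # All neighbors whose distance matches the minimum within the tolerance
    candidates = [k for d, k in dists if d - d_min <= tol]
    return rng.choice(candidates), d_min

# 3 x 3 grid; the center point (index 4) has four equidistant neighbors
grid = [(i, j) for j in range(3) for i in range(3)]
rng = random.Random(0)
picks = [nearest_neighbor_random_tie(grid, 4, rng)[0] for _ in range(200)]
print(sorted(set(picks)))
```

Over many draws each of the four neighbors is selected, so the resulting NN\( \theta \) values are spread over the four grid directions instead of always pointing the same way.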

[Figure: Grid Nearest Neighbor]

In-Silico Systems

For the rest of these sections we will repeatedly use several simple in-silico systems to test our methods and try to better understand the kind of results we obtain from them.

  • Compression

    • The simplest system involves scaling in every direction by \( \alpha \)
    • \( \alpha<1 \) the system is compressed
    • \( \alpha>1 \) the system is expanded

    \[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \alpha \begin{bmatrix} x \\ y \end{bmatrix} \]

  • Shearing

    • Slightly more complicated system where objects are shifted based on their location using a slope of \( \alpha \)

    \[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

In-Silico Systems (Continued)

  • Stretch

    • A non-evenly distributed system with a parameter \( \alpha \) controlling whether objects bunch near the edges or the center. A maximum distance \( m \) is defined as the magnitude of the largest offset in x or y
    • Same total volume, just arranged differently

    \[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \begin{bmatrix} \textrm{sign}(x) \left(\frac{|x|}{m}\right)^\alpha m \\ \textrm{sign}(y) \left(\frac{|y|}{m}\right)^\alpha m \end{bmatrix} \]

  • Swirl

    • A transformation where points are rotated by an angle that grows with their distance from the center, scaled by the slope of the swirl (\( \alpha \))

    \[ \theta (x,y) = \alpha \sqrt{x^2+y^2} \]

    \[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \begin{bmatrix} \cos\theta(x,y) & -\sin\theta(x,y) \\ \sin\theta(x,y) & \cos\theta(x,y) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]
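
The four in-silico transformations (compression, shearing, stretch and swirl) can be written directly from their formulas. The sketch below is a plain transcription for a single 2D point; the function names are chosen for this example and are not taken from the lecture code.

```python
import math

def compress(x, y, alpha):
    """Scale both coordinates by alpha (alpha < 1 compresses, alpha > 1 expands)."""
    return alpha * x, alpha * y

def shear(x, y, alpha):
    """Shift x in proportion to y with slope alpha; y is unchanged."""
    return x + alpha * y, y

def stretch(x, y, alpha, m):
    """Apply sign(v) * (|v| / m) ** alpha * m to each coordinate."""
    def f(v):
        sign = (v > 0) - (v < 0)
        return sign * (abs(v) / m) ** alpha * m
    return f(x), f(y)

def swirl(x, y, alpha):
    """Rotate by theta = alpha * r, where r is the distance from the center."""
    theta = alpha * math.hypot(x, y)
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y

# A small centered grid pushed through each transformation
grid = [(i, j) for i in (-1.0, 0.0, 1.0) for j in (-1.0, 0.0, 1.0)]
print([compress(x, y, 0.5) for x, y in grid][:3])
print([shear(x, y, 0.5) for x, y in grid][:3])
print([stretch(x, y, 2.0, 1.0) for x, y in grid][:3])
print([swirl(x, y, 0.3) for x, y in grid][:3])
```

Note that the swirl keeps every point at its original distance from the center, and the stretch leaves points at the maximum offset \( m \) where they are.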

Examining Compression

[Figures: Uniaxially Stretched; Grid Nearest Neighbor]

Compression Distributions

[Figures: Length Distribution; NN Orientation]

Examining Different Shears

[Figures: Uniaxially Stretched; Grid Nearest Neighbor]

Shear Distributions

[Figures: Length Distribution; NN Orientation]

Examining Different Stretches

[Figures: Uniaxially Stretched; Grid Nearest Neighbor]

Stretch Distributions

[Figures: Length Distribution; NN Orientation]

Examining Swirl Systems

[Figure: Grid Nearest Neighbor]

Swirl NN Distributions

[Figures: Length Distribution; NN Orientation]

What we notice

These metrics have several fairly significant shortcomings (particularly with in-silico systems)

  1. Orientation appears to be useful but random
    • Why should it matter if one side is 0.01% closer?
  2. Single outlier objects skew results
  3. We only extract one piece of information
  4. Difficult to create metrics
    • Fit a peak to the angle distribution and measure the width as the “angle variability”?

Luckily we are not the first people to address this issue

Random Systems

Using a uniform grid of points as a starting point has a strong influence on the results. A better approach is to use a randomly distributed series of points, which

  • resembles real data much better
  • avoids these symmetry problems
    • \( \epsilon \)-sized edges or overlaps
    • identical distances to nearby objects

[Figure: Maximum Stretch]

Examining Compression

[Figures: Uniaxially Stretched; Grid Nearest Neighbor]

Compression Distributions

[Figures: Length Distribution; NN Orientation]

Examining Different Shears

[Figures: Uniaxially Stretched; Grid Nearest Neighbor]

Shear Distributions

[Figures: Length Distribution; NN Orientation]

Examining Different Stretches

[Figures: Uniaxially Stretched; Grid Nearest Neighbor]

Stretch Distributions

[Figures: Length Distribution; NN Orientation]

Examining Swirl Systems

[Figure: Grid Nearest Neighbor]

Swirl NN Distributions

[Figures: Length Distribution; NN Orientation]

Voronoi Tessellation

Voronoi tessellation is a method for partitioning a space based on points. The basic idea is that each point \( \vec{p} \) is assigned a region \( \textbf{R} \) consisting of all positions which are closer to \( \vec{p} \) than to any of the other points. In the diagram below, the tessellation is drawn with dashed lines and the points are shown as small circles.
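
On a pixel grid this definition can be applied literally: every pixel receives the label of the closest seed point. The following brute-force sketch illustrates the voxel-based variant under that assumption; the function name and the rule that ties go to the lowest seed index are choices made for this example only.

```python
def voronoi_labels(seeds, width, height):
    """Label each pixel (col, row) with the index of its nearest seed."""
    labels = []
    for row in range(height):
        line = []
        for col in range(width):
            # Squared Euclidean distance is enough for finding the minimum;
            # min() keeps the first (lowest) index when distances are tied
            nearest = min(
                range(len(seeds)),
                key=lambda k: (seeds[k][0] - col) ** 2 + (seeds[k][1] - row) ** 2,
            )
            line.append(nearest)
        labels.append(line)
    return labels

# Two seeds given as (x, y) in an 8 x 3 pixel image
seeds = [(1, 1), (6, 1)]
for line in voronoi_labels(seeds, 8, 3):
    print(line)
```

The area of each labelled region (its pixel count) then gives a per-object measure of local density that does not depend on arbitrarily placed boxes.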